Abstract: A linear system of equations is represented in matrix form as $Ax = b$, where $A$ is a given matrix, $b$ a given vector and $x$ the unknown vector to be found. Steepest Descent and Conjugate Gradient are well-known iterative methods for solving this system. These methods require the matrix $A$ to be symmetric and positive-definite. This article introduces two iterative methods, Method 1 and Method 2, which resemble the two aforementioned methods but require $A$ to be neither symmetric nor positive-definite.

Index Terms: steepest descent, conjugate gradient, iterative method, linear algebra system, nonsymmetric, nonpositive-definite



I. Iterative methods 

Iterative methods aim to solve $Ax = b$ through a sequence of guesses. The new guess of the solution, $x_n$, usually depends on the last guess $x_{n-1}$:

$$x_n = f(x_{n-1})$$

where $f(\cdot)$ is the rule for guessing.

The guessing process stops when $Ax_n \approx b$, i.e. when the residual $|r_n| = |b - Ax_n| \approx 0$.

The key to iterative methods is to formulate $f(\cdot)$ such that the guesses converge quickly to the solution.
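As a minimal sketch of this framework (plain Python, vectors as lists), the loop below applies a guessing rule until the residual norm falls below a tolerance or an iteration cap is reached. The names `iterate`, `step`, `residual`, `tol` and `max_iter` are hypothetical, and the toy rule used in the example is an assumption for illustration only, not one of the methods in this article.

```python
import math
from typing import Callable, List

Vector = List[float]


def norm(v: Vector) -> float:
    """Euclidean length |v|."""
    return math.sqrt(sum(vi * vi for vi in v))


def iterate(step: Callable[[Vector], Vector],
            residual: Callable[[Vector], Vector],
            x0: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Apply x_n = f(x_{n-1}) until |r_n| <= tol or max_iter guesses were made."""
    x = list(x0)
    for _ in range(max_iter):
        if norm(residual(x)) <= tol:  # stop when A x_n ~ b
            break
        x = step(x)                   # new guess computed from the last guess
    return x


# Hypothetical 1x1 example: solve 2x = 4 with the toy rule x_n = x_{n-1} + 0.25 (4 - 2 x_{n-1}).
solution = iterate(lambda x: [x[0] + 0.25 * (4.0 - 2.0 * x[0])],
                   lambda x: [4.0 - 2.0 * x[0]],
                   [0.0])
```

Passing the rule and the residual as functions keeps the stopping logic separate from $f(\cdot)$, which is the only part that differs between the methods below.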



II. Method 1 
The guessing rule for Method 1 is

$$x_n = f(x_{n-1}) = x_{n-1} + \alpha_{n-1} r_{n-1}$$

i.e. the new guess $x_n$ depends on the last guess $x_{n-1}$ and the last residual $r_{n-1}$. Here, $\alpha_{n-1}$ is a scalar.

We start with the initial guess $x_0$, whose residual is $r_0 = b - Ax_0$.

By the rule, the next guess is $x_1 = x_0 + \alpha_0 r_0$.

Which value of $\alpha_0$ makes $x_1$ the best approximation? In other words, which value of $\alpha_0$ minimizes the residual $|r_1|$ incurred by

$$r_1 = b - Ax_1 = b - A(x_0 + \alpha_0 r_0) = r_0 - \alpha_0 A r_0$$

$$|r_1| = |r_0 - \alpha_0 A r_0| = |\alpha_0 A r_0 - r_0|$$

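Minimizing $|r_1|$ is equivalent to minimizing $|\alpha_0 A r_0 - r_0|$. Figure II.1 uses a vector diagram to show possible positions of $\alpha_0 A r_0 - r_0$.

It is observed in the figure that $|\alpha_0 A r_0 - r_0|$ has its smallest value when $\alpha_0 A r_0 - r_0$ is orthogonal to $A r_0$, i.e.

$$(A r_0)^T (\alpha_0 A r_0 - r_0) = 0 \;\Rightarrow\; \alpha_0 = \frac{(A r_0)^T r_0}{(A r_0)^T (A r_0)}$$

Hence, at every step,

$$\alpha_{n-1} = \frac{(A r_{n-1})^T r_{n-1}}{(A r_{n-1})^T (A r_{n-1})}$$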
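The step length can be checked numerically. The sketch below (plain Python; the 2×2 matrix and the residual are arbitrary test data, and the helper names are hypothetical) computes $\alpha_0$ by the formula above, then confirms the orthogonality condition read off Figure II.1 and that moving $\alpha_0$ slightly in either direction only increases $|r_1|$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def best_alpha(A, r):
    """alpha = (A r)^T r / ((A r)^T (A r)), the minimizer of |r - alpha A r|."""
    Ar = matvec(A, r)
    return dot(Ar, r) / dot(Ar, Ar)


def residual_after_step(A, r, alpha):
    """r_1 = r_0 - alpha A r_0."""
    return [ri - alpha * ari for ri, ari in zip(r, matvec(A, r))]


A = [[4.0, 1.0], [-2.0, 3.0]]   # arbitrary nonsymmetric test matrix
r0 = [1.0, 2.0]                 # arbitrary first residual
alpha0 = best_alpha(A, r0)      # (6*1 + 4*2) / (6*6 + 4*4) = 14 / 52
r1 = residual_after_step(A, r0, alpha0)
norm_r1 = math.sqrt(dot(r1, r1))
```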

Method 1 is summarized as follows.




Figure II.1. Illustration of the best position for $\alpha_0 A r_0 - r_0$ by vector diagram.



1) Start with the first guess $x_0$ and calculate the first residual $r_0 = b - Ax_0$.

2) Repeat until $|r_n| \approx 0$ or a certain number of iterations is reached:

$$\alpha_{n-1} = \frac{(A r_{n-1})^T r_{n-1}}{(A r_{n-1})^T (A r_{n-1})}$$
$$x_n = x_{n-1} + \alpha_{n-1} r_{n-1}$$
$$r_n = b - Ax_n$$
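A direct transcription of these steps into plain Python might look like the sketch below. The function name `method1` and its parameters are hypothetical. The test matrix is nonsymmetric but, by our choice, its symmetric part $(A + A^T)/2$ is positive-definite, which keeps $(A r)^T r > 0$ and hence $\alpha \neq 0$ for every nonzero residual. No guard against $A r = 0$ (possible for singular $A$) is included.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def method1(A, b, x0, tol=1e-10, max_iter=10_000):
    """Method 1: x_n = x_{n-1} + alpha_{n-1} r_{n-1} with the residual-minimizing alpha."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]          # r_0 = b - A x_0
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:                       # |r_n| ~ 0
            break
        Ar = matvec(A, r)
        alpha = dot(Ar, r) / dot(Ar, Ar)                      # alpha_{n-1}
        x = [xi + alpha * ri for xi, ri in zip(x, r)]         # x_n
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]      # r_n = b - A x_n
    return x


A = [[4.0, 1.0], [-2.0, 3.0]]   # nonsymmetric; symmetric part is positive-definite
b = [1.0, 2.0]
x = method1(A, b, [0.0, 0.0])   # exact solution is (1/14, 10/14)
```

Recomputing $r_n = b - Ax_n$ at each step follows the listed rule; it costs one extra matrix-vector product compared with updating the residual recursively.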



III. Method 2 

A. Guessing rule: 

Method 2 uses this guessing rule:

$$x_n = x_{n-1} + \alpha_{n-1} d_{n-1} \qquad (1)$$

The new guess no longer depends on the last residual but rather on a certain vector $d_{n-1}$. Our aim is to find the best $\alpha_{n-1}$ and $d_{n-1}$ to minimize $|r_n|$:

$$r_n = b - Ax_n = b - A(x_{n-1} + \alpha_{n-1} d_{n-1})$$

$$r_n = r_{n-1} - \alpha_{n-1} A d_{n-1} \qquad (2)$$

Moreover, $r_{n-1} = r_{n-2} - \alpha_{n-2} A d_{n-2}$, and so on:

$$r_n = r_{n-2} - \alpha_{n-2} A d_{n-2} - \alpha_{n-1} A d_{n-1} = \dots$$

$$r_n = r_0 - \sum_{i=0}^{n-1} \alpha_i A d_i \qquad (3)$$

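B. Best value of $\alpha_i$ to minimize $|r_n|$
Minimizing $|r_n|$ is equivalent to minimizing $|r_n|^2$:

$$|r_n|^2 = \Big(r_0 - \sum_{i=0}^{n-1} \alpha_i A d_i\Big)^T \Big(r_0 - \sum_{i=0}^{n-1} \alpha_i A d_i\Big)$$

$$= r_0^T r_0 - 2 \sum_{i=0}^{n-1} \alpha_i (A d_i)^T r_0 + \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \alpha_i \alpha_j (A d_i)^T (A d_j) \qquad (4)$$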

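The term $\sum_{i=0}^{n-1} \alpha_i A d_i$ in (3) is a linear combination of the vectors $A d_i$. Let us assume that all $A d_i$ vectors are orthogonal to each other, so that every cross term with $i \neq j$ in (4) vanishes and (4) simplifies to

$$|r_n|^2 = r_0^T r_0 - 2 \sum_{i=0}^{n-1} \alpha_i (A d_i)^T r_0 + \sum_{i=0}^{n-1} \alpha_i^2 (A d_i)^T (A d_i)$$

Figure III.1. Illustration of the Gram-Schmidt method ($P(u_n)$: projection of $u_n$ onto the space made by the previous $d_i$).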

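Since each $\alpha_i$ now appears only in its own terms, $|r_n|^2$ is minimized by setting its partial derivative with respect to each $\alpha_i$ to zero:

$$\frac{\partial |r_n|^2}{\partial \alpha_i} = -2 (A d_i)^T r_0 + 2 \alpha_i (A d_i)^T (A d_i) = 0$$

$$\alpha_i = \frac{(A d_i)^T r_0}{(A d_i)^T (A d_i)} \qquad (5)$$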



C. The residual $r_n$ is orthogonal to previous $A d_i$
Multiplying both sides of (3) by $(A d_j)^T$, we have

$$(A d_j)^T r_n = (A d_j)^T r_0 - \sum_{i=0}^{n-1} \alpha_i (A d_j)^T (A d_i) \qquad (6)$$

• If $j < n$, since the $A d_i$ are orthogonal to each other, (6) simplifies to

$$(A d_j)^T r_n = (A d_j)^T r_0 - \alpha_j (A d_j)^T (A d_j) \qquad (7)$$

Substituting (5) into (7) for $\alpha_j$, we obtain

$$(A d_j)^T r_n = (A d_j)^T r_0 - \frac{(A d_j)^T r_0}{(A d_j)^T (A d_j)} (A d_j)^T (A d_j) = 0$$

Hence the residual is orthogonal to all previous $A d_i$, i.e. $r_n^T (A d_i) = 0, \ \forall i < n$.

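• If $j \geq n$, every index $i$ in the sum of (6) is smaller than $j$, so, since the $A d_i$ are orthogonal to each other, (6) simplifies to

$$(A d_j)^T r_n = (A d_j)^T r_0$$

Hence, when $j = n$, $(A d_j)^T r_j = (A d_j)^T r_0$, and we can rewrite (5) as

$$\alpha_j = \frac{(A d_j)^T r_j}{(A d_j)^T (A d_j)} \qquad (8)$$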



D. Finding $d_n$

All $d_n$ must satisfy the condition that the vectors $A d_n$ are orthogonal to each other. We can make use of the Gram-Schmidt method to achieve this.

The Gram-Schmidt method generates $m$ orthogonal vectors $d_n$ from $m$ given arbitrary (linearly independent) vectors $u_n$. The method is as follows:



1) $d_0 = u_0$

2) For $n = 1$ to $m - 1$:

$$d_n = u_n - \sum_{i=0}^{n-1} \beta_i d_i, \quad \text{where } \beta_i = \frac{d_i^T u_n}{d_i^T d_i}$$
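The two steps above can be sketched in plain Python as follows (function name hypothetical; inputs assumed linearly independent, otherwise some $d_i$ becomes zero and the division fails). The coefficient $\beta_i = d_i^T u_n / d_i^T d_i$ projects the original $u_n$ onto each earlier $d_i$, which is the classical form of the method.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(us):
    """d_0 = u_0; d_n = u_n - sum_i beta_i d_i with beta_i = d_i^T u_n / (d_i^T d_i)."""
    ds = []
    for u in us:
        d = list(u)
        for di in ds:
            beta = dot(di, u) / dot(di, di)                  # projection coefficient of u_n on d_i
            d = [dk - beta * dik for dk, dik in zip(d, di)]  # remove that component
        ds.append(d)
    return ds


us = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]   # arbitrary independent inputs
ds = gram_schmidt(us)   # approximately [1, 1, 0], [0.5, -0.5, 1], [-2/3, 2/3, 2/3]
```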



Figure III.1 illustrates how the Gram-Schmidt method works. In the figure, the space made by the previous $d_i$ is represented by a plane. Subtracting from $u_n$ its projection $P(u_n)$ onto this plane gives a vector orthogonal to the plane.

The Gram-Schmidt method generates $d_n$ that are orthogonal to each other, whereas we want $d_n$ such that the vectors $A d_n$ are orthogonal to each other.

We adapt the rule $d_n = u_n - \sum_{i=0}^{n-1} \beta_i d_i$ from the Gram-Schmidt method to achieve this purpose.



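Multiplying both sides of this rule by the matrix $A$ gives

$$A d_n = A u_n - \sum_{i=0}^{n-1} \beta_i A d_i$$

Multiplying both sides of the last equation by $(A d_i)^T$ with $i < n$, requiring $(A d_i)^T (A d_n) = 0$, and using the fact that the previous $A d_i$ are orthogonal to each other (so only the $i$-th term of the sum survives), we obtain

$$0 = (A d_i)^T (A u_n) - \beta_i (A d_i)^T (A d_i) \;\Rightarrow\; \beta_i = \frac{(A d_i)^T (A u_n)}{(A d_i)^T (A d_i)}$$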

So here is the method to generate $d_n$ such that the vectors $A d_n$ are orthogonal to each other from a given set of vectors $u_n$:

1) $d_0 = u_0$

2) For $n = 1$ to $m - 1$:

$$d_n = u_n - \sum_{i=0}^{n-1} \frac{(A d_i)^T (A u_n)}{(A d_i)^T (A d_i)}\, d_i$$
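A plain-Python sketch of this modified procedure follows (function name hypothetical; $A$ is assumed nonsingular and the $u_n$ linearly independent, so no $A d_i$ vanishes). It is the Gram-Schmidt loop with every inner product $v^T w$ replaced by $(A v)^T (A w)$, and it caches each $A d_i$ so it is computed only once.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def a_orthogonalize(A, us):
    """Build d_n from u_n so that the vectors A d_n are mutually orthogonal."""
    ds, Ads = [], []
    for u in us:
        Au = matvec(A, u)
        d = list(u)                                          # d_0 = u_0 while ds is empty
        for di, Adi in zip(ds, Ads):
            beta = dot(Adi, Au) / dot(Adi, Adi)              # (A d_i)^T (A u_n) / (A d_i)^T (A d_i)
            d = [dk - beta * dik for dk, dik in zip(d, di)]
        ds.append(d)
        Ads.append(matvec(A, d))                             # cache A d_n for later steps
    return ds


A = [[4.0, 1.0, 0.0], [-2.0, 3.0, 1.0], [0.0, -1.0, 2.0]]   # nonsingular, nonsymmetric test data
us = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ds = a_orthogonalize(A, us)
Ads = [matvec(A, d) for d in ds]
```

The $d_n$ themselves are generally not orthogonal; only their images $A d_n$ are.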



E. Method 2 with given $u_n$

If a set of vectors $u_n$ is given, Method 2 is as follows:

1) Start with the first guess $x_0$ and calculate the first residual $r_0 = b - Ax_0$.

2) $d_0 = u_0$

3) Repeat until $|r_n| \approx 0$ or a certain number of iterations is reached:

$$\alpha_{n-1} = \frac{(A d_{n-1})^T r_{n-1}}{(A d_{n-1})^T (A d_{n-1})}$$
$$x_n = x_{n-1} + \alpha_{n-1} d_{n-1} \quad \text{(guessing rule for Method 2)}$$
$$r_n = r_{n-1} - \alpha_{n-1} (A d_{n-1}) \quad \text{(result from (2))}$$
$$d_n = u_n - \sum_{i=0}^{n-1} \frac{(A d_i)^T (A u_n)}{(A d_i)^T (A d_i)}\, d_i$$
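The listed steps can be sketched in plain Python as below (function name and test data hypothetical). The loop builds $d_n$ at the start of each pass instead of at the end, which is equivalent to the listing, and it uses the form (8) of the step length. The sequence `us` is assumed linearly independent and $A$ nonsingular, so no guard against $(A d_n)^T (A d_n) = 0$ is included.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def method2_given_u(A, b, x0, us, tol=1e-12):
    """Method 2 with a supplied sequence u_0, u_1, ...; returns (x, r, list of A d_i)."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]        # r_0 = b - A x_0
    ds, Ads = [], []                                         # previous d_i and A d_i
    for u in us:
        if math.sqrt(dot(r, r)) <= tol:                      # |r_n| ~ 0
            break
        # d_n = u_n - sum_i (A d_i)^T (A u_n) / ((A d_i)^T (A d_i)) d_i   (d_0 = u_0)
        Au = matvec(A, u)
        d = list(u)
        for di, Adi in zip(ds, Ads):
            beta = dot(Adi, Au) / dot(Adi, Adi)
            d = [dk - beta * dik for dk, dik in zip(d, di)]
        Ad = matvec(A, d)
        ds.append(d)
        Ads.append(Ad)
        alpha = dot(Ad, r) / dot(Ad, Ad)                     # step length, form (8)
        x = [xk + alpha * dk for xk, dk in zip(x, d)]        # guessing rule (1)
        r = [rk - alpha * adk for rk, adk in zip(r, Ad)]     # residual update (2)
    return x, r, Ads


A = [[4.0, 1.0, 0.0], [-2.0, 3.0, 1.0], [0.0, -1.0, 2.0]]
b = [1.0, 2.0, 3.0]
us = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x, r, Ads = method2_given_u(A, b, [0.0, 0.0, 0.0], us)
```

Returning the final residual and the $A d_i$ makes it easy to check the orthogonality property of Section III-C on the result.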



F. Convergence of Method 2

Assume that the matrix $A$ is $m \times m$. Hence, the dimension of the space of $Ax$ (the column space of $A$) is at most $m$. In Method 2, the vectors $A d_i$ are mutually orthogonal vectors in the space of $Ax$. Therefore, there are at most $m$ nonzero vectors $A d_i$.

If $r_0$ belongs to the space of $Ax$, i.e. $r_0$ can be expressed as a linear combination of the $A d_i$,

$$r_0 = \sum_{i=0}^{m-1} \gamma_i A d_i \qquad (9)$$

To find $\gamma_i$, multiply both sides of the last equation by $(A d_i)^T$ and use the fact that the $A d_i$ are orthogonal to each other to obtain

$$(A d_i)^T r_0 = \gamma_i (A d_i)^T (A d_i) \;\Rightarrow\; \gamma_i = \frac{(A d_i)^T r_0}{(A d_i)^T (A d_i)}$$

Substituting the result for $\gamma_i$ into (9), we have

$$r_0 = \sum_{i=0}^{m-1} \frac{(A d_i)^T r_0}{(A d_i)^T (A d_i)}\, A d_i \qquad (10)$$

From (3), after $m$ iterations we have

$$r_m = r_0 - \sum_{i=0}^{m-1} \alpha_i A d_i \qquad (11)$$

Substituting the result for $r_0$ from (10) and $\alpha_i$ from (5) into (11), we have

$$r_m = \sum_{i=0}^{m-1} \frac{(A d_i)^T r_0}{(A d_i)^T (A d_i)}\, A d_i - \sum_{i=0}^{m-1} \frac{(A d_i)^T r_0}{(A d_i)^T (A d_i)}\, A d_i = 0$$

$\Rightarrow$ Method 2 gives the exact solution after $m$ iterations if $r_0$ lies in the space of $Ax$.

Even if $r_0$ does not lie in the space of $Ax$, after $m$ iterations the guess $x_m$ is one that minimizes $|b - Ax|$.
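The second case can be illustrated with a rank-deficient matrix. In the sketch below (plain Python; names and data are assumptions) the space of $Ax$ is spanned by $(1, 2)$ and $b = (1, 0)$ lies outside it. The text only notes that some $A d_i$ may be zero; skipping such a direction instead of dividing by zero is our own assumption. The final residual satisfies $A^T r_m = 0$, which is the standard condition for $|b - Ax|$ to be minimal.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def rmatvec(A, v):
    """A^T v, computed without forming the transpose."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def method2_m_steps(A, b, x0, us, eps=1e-24):
    """Run Method 2 once over all m given u_n, skipping any d_n whose A d_n vanishes."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    ds, Ads = [], []
    for u in us:
        Au = matvec(A, u)
        d = list(u)
        for di, Adi in zip(ds, Ads):
            beta = dot(Adi, Au) / dot(Adi, Adi)
            d = [dk - beta * dik for dk, dik in zip(d, di)]
        Ad = matvec(A, d)
        if dot(Ad, Ad) <= eps:        # A d_n = 0 adds nothing to the space of Ax
            continue
        ds.append(d)
        Ads.append(Ad)
        alpha = dot(Ad, r) / dot(Ad, Ad)
        x = [xk + alpha * dk for xk, dk in zip(x, d)]
        r = [rk - alpha * adk for rk, adk in zip(r, Ad)]
    return x, r


A = [[1.0, 2.0], [2.0, 4.0]]     # rank 1: the space of Ax is spanned by (1, 2)
b = [1.0, 0.0]                    # b is not in that space
x, r = method2_m_steps(A, b, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# x is about (0.2, 0) and r is about (0.8, -0.4)
```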
G. Choice of $u_i$

The $u_i$ can be chosen arbitrarily. However, they should be chosen carefully: if, for example, one accidentally chooses a $u_i$ that is linearly dependent on the previous $d_i$, one simply wastes an iteration, because the $d_i$ generated from that $u_i$ will be $0$.

One good choice is $u_i = r_i$, i.e. setting $u_i = r_i$ at each iteration, because if $r_i$ is zero at a certain step, no further iterations are needed: the solution has already been reached.

So here is Method 2 with $u_i = r_i$:

1) Start with the first guess $x_0$ and calculate the first residual $r_0 = b - Ax_0$.

2) $d_0 = r_0$

3) Repeat until $|r_n| \approx 0$ or a certain number of iterations is reached:

$$\alpha_{n-1} = \frac{(A d_{n-1})^T r_{n-1}}{(A d_{n-1})^T (A d_{n-1})}$$
$$x_n = x_{n-1} + \alpha_{n-1} d_{n-1}$$
$$r_n = r_{n-1} - \alpha_{n-1} (A d_{n-1})$$
$$d_n = r_n - \sum_{i=0}^{n-1} \frac{(A d_i)^T (A r_n)}{(A d_i)^T (A d_i)}\, d_i \qquad (12)$$
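A plain-Python sketch of this variant follows (function name and test data hypothetical). It keeps every previous $d_i$ and $A d_i$ for the sum in (12), which is the storage and work cost discussed next. `max_iter` defaults to $m$, following Section III-F. As an assumption that keeps every step length nonzero, the test matrix is nonsymmetric with a positive-definite symmetric part.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def method2_residual_u(A, b, x0, tol=1e-12, max_iter=None):
    """Method 2 with u_n = r_n; all previous d_i and A d_i are stored for (12)."""
    max_iter = len(b) if max_iter is None else max_iter
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]        # r_0
    d = list(r)                                              # d_0 = r_0
    ds, Ads = [], []
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            break
        Ad = matvec(A, d)
        ds.append(d)
        Ads.append(Ad)
        alpha = dot(Ad, r) / dot(Ad, Ad)                     # alpha_{n-1}
        x = [xk + alpha * dk for xk, dk in zip(x, d)]        # x_n
        r = [rk - alpha * adk for rk, adk in zip(r, Ad)]     # r_n
        Ar = matvec(A, r)
        d = list(r)                                          # d_n from (12): start from r_n ...
        for di, Adi in zip(ds, Ads):                         # ... and subtract every stored d_i
            beta = dot(Adi, Ar) / dot(Adi, Adi)
            d = [dk - beta * dik for dk, dik in zip(d, di)]
    return x


A = [[4.0, 1.0, 0.0], [-2.0, 3.0, 1.0], [0.0, -1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = method2_residual_u(A, b, [0.0, 0.0, 0.0])
```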



Still, Method 2 with $u_i = r_i$ has a shortcoming: we need to store all previous $d_i$ at each step (see equation (12)), and the summation in (12) is computationally expensive.

Is it possible to eliminate the summation?

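H. Eliminating the summation in (12)
From (2), we have

$$r_{i+1} = r_i - \alpha_i A d_i \;\Rightarrow\; A d_i = \frac{r_i - r_{i+1}}{\alpha_i}$$

Substituting this result for $(A d_i)^T$ in the numerator of the rule for $d_n$ (equation (12) with $r_n$ replaced by a general $u_n$), we obtain

$$d_n = u_n - \sum_{i=0}^{n-1} \frac{(A d_i)^T (A u_n)}{(A d_i)^T (A d_i)}\, d_i = u_n - \sum_{i=0}^{n-1} \frac{r_i^T (A u_n)}{\alpha_i (A d_i)^T (A d_i)}\, d_i + \sum_{i=0}^{n-1} \frac{r_{i+1}^T (A u_n)}{\alpha_i (A d_i)^T (A d_i)}\, d_i \qquad (13)$$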

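If $A u_n$ is orthogonal to the previous $r_i$, i.e. $r_i^T (A u_n) = 0, \ \forall i < n$, then the first sum in (13) vanishes and only the term $i = n - 1$ of the second sum remains, so (13) simplifies to

$$d_n = u_n + \frac{r_n^T (A u_n)}{\alpha_{n-1} (A d_{n-1})^T (A d_{n-1})}\, d_{n-1}$$

Substituting the value of $\alpha_{n-1}$ from (8) into the last equation and simplifying, we obtain

$$d_n = u_n + \frac{r_n^T (A u_n)}{(A d_{n-1})^T r_{n-1}}\, d_{n-1} \qquad (14)$$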

As seen in (14), the summation is gone. Therefore, our job is to choose $u_n$ such that

$$A u_n \text{ is orthogonal to the previous } r_i, \text{ i.e. } r_i^T (A u_n) = 0, \ \forall i < n \qquad (15)$$

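From the rule for $d_n$ in (12), with $r_n$ replaced by a general $u_n$, $u_n$ can be expressed as

$$u_n = d_n + \sum_{i=0}^{n-1} \beta_i d_i, \quad \text{where } \beta_i = \frac{(A d_i)^T (A u_n)}{(A d_i)^T (A d_i)}$$

$$A u_n = A d_n + \sum_{i=0}^{n-1} \beta_i A d_i$$

In other words, $A u_n$ is a linear combination of $A d_0, \dots, A d_n$. We know from Section III-C that $r_n$ is orthogonal to the previous $A d_i$ (i.e. for all $i < n$). Hence, $r_n$ is orthogonal to any linear combination of the $A d_i$ with $i < n$.

Therefore, because each $A u_i$ with $i < n$ is a linear combination of $A d_0, \dots, A d_i$, all of which are previous $A d_k$,

$$r_n \text{ is orthogonal to the previous } A u_i, \text{ i.e. } r_n^T (A u_i) = 0, \ \forall i < n \qquad (16)$$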

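A possible choice of $u_n$ that satisfies (15) is $u_n = A^T r_n$, because then we have

$$\begin{aligned}
(16) &\Leftrightarrow r_n^T (A u_i) = 0 && \forall i < n \\
&\Leftrightarrow (A^T r_n)^T u_i = 0 && \forall i < n \\
&\Leftrightarrow u_n^T u_i = 0 && \forall i < n \quad (\text{using } u_n = A^T r_n) \\
&\Leftrightarrow u_n^T (A^T r_i) = 0 && \forall i < n \quad (\text{using } u_i = A^T r_i) \\
&\Leftrightarrow (A u_n)^T r_i = 0 && \forall i < n \\
&\Leftrightarrow A u_n \text{ is orthogonal to the previous } r_i \Leftrightarrow (15)
\end{aligned}$$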
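We rewrite (14) for $u_n = A^T r_n$, noting that $r_n^T (A A^T r_n) = (A^T r_n)^T (A^T r_n)$:

$$d_n = A^T r_n + \frac{(A^T r_n)^T (A^T r_n)}{(A d_{n-1})^T r_{n-1}}\, d_{n-1}$$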
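Here is Method 2 with $u_n = A^T r_n$:

1) Start with the first guess $x_0$ and calculate the first residual $r_0 = b - Ax_0$.

2) $d_0 = A^T r_0$

3) Repeat until $|r_n| \approx 0$ or a certain number of iterations is reached:

$$\alpha_{n-1} = \frac{(A d_{n-1})^T r_{n-1}}{(A d_{n-1})^T (A d_{n-1})}$$
$$x_n = x_{n-1} + \alpha_{n-1} d_{n-1}$$
$$r_n = r_{n-1} - \alpha_{n-1} (A d_{n-1})$$
$$d_n = A^T r_n + \frac{(A^T r_n)^T (A^T r_n)}{(A d_{n-1})^T r_{n-1}}\, d_{n-1}$$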
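A plain-Python sketch of this final form follows (function names and test data hypothetical). Only $d_{n-1}$ is kept between iterations, and the denominator $(A d_{n-1})^T r_{n-1}$ is computed once and reused for both $\alpha_{n-1}$ and the coefficient of $d_{n-1}$. The test matrix has a zero diagonal, so it is neither symmetric nor positive-definite. `max_iter` defaults to $m$ as in Section III-F; in floating-point arithmetic, larger or ill-conditioned systems may need a few extra iterations.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def rmatvec(A, v):
    """A^T v, computed without forming the transpose."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def method2_short(A, b, x0, tol=1e-12, max_iter=None):
    """Method 2 with u_n = A^T r_n: only the previous direction d_{n-1} is stored."""
    max_iter = len(b) if max_iter is None else max_iter
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]        # r_0 = b - A x_0
    d = rmatvec(A, r)                                        # d_0 = A^T r_0
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            break
        Ad = matvec(A, d)
        Ad_r = dot(Ad, r)                                    # (A d_{n-1})^T r_{n-1}
        alpha = Ad_r / dot(Ad, Ad)                           # alpha_{n-1}
        x = [xk + alpha * dk for xk, dk in zip(x, d)]        # x_n
        r = [rk - alpha * adk for rk, adk in zip(r, Ad)]     # r_n
        Atr = rmatvec(A, r)                                  # A^T r_n
        beta = dot(Atr, Atr) / Ad_r
        d = [ak + beta * dk for ak, dk in zip(Atr, d)]       # d_n = A^T r_n + beta d_{n-1}
    return x


A = [[0.0, 2.0, 1.0], [-1.0, 0.0, 3.0], [1.0, -1.0, 0.0]]   # nonsingular, zero diagonal
b = [1.0, 2.0, 3.0]
x = method2_short(A, b, [0.0, 0.0, 0.0])
```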



IV. Contrast of Method 1 vs Steepest Descent Method and Method 2 vs Conjugate Gradient Method

The following two tables contrast Method 1 with the Steepest Descent Method and Method 2 with the Conjugate Gradient Method.



A. Steepest Descent Method vs Method 1 



Steepest Descent Method [1]

1) Start with the first guess $x_0$ and calculate the residual $r_0 = b - Ax_0$.

2) Repeat until $|r_n| \approx 0$ or a certain number of iterations is reached:

$$\alpha_{n-1} = \frac{r_{n-1}^T r_{n-1}}{r_{n-1}^T (A r_{n-1})}$$
$$x_n = x_{n-1} + \alpha_{n-1} r_{n-1}$$
$$r_n = b - Ax_n$$

Requirements: matrix $A$ has to be symmetric and positive-definite.

Method 1

1) Start with the first guess $x_0$ and calculate the residual $r_0 = b - Ax_0$.

2) Repeat until $|r_n| \approx 0$ or a certain number of iterations is reached:

$$\alpha_{n-1} = \frac{(A r_{n-1})^T r_{n-1}}{(A r_{n-1})^T (A r_{n-1})}$$
$$x_n = x_{n-1} + \alpha_{n-1} r_{n-1}$$
$$r_n = b - Ax_n$$

Requirements: none.
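To make this contrast concrete, the sketch below (plain Python; function names and data hypothetical) runs one shared loop with the two step-length formulas, since only $\alpha_{n-1}$ differs between the methods. The test matrix is symmetric positive-definite so that Steepest Descent is applicable too; both runs should reach the same solution $(1/11, 7/11)$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def sd_alpha(A, r):
    """Steepest Descent step length: r^T r / (r^T A r)."""
    return dot(r, r) / dot(r, matvec(A, r))


def method1_alpha(A, r):
    """Method 1 step length: (A r)^T r / ((A r)^T (A r))."""
    Ar = matvec(A, r)
    return dot(Ar, r) / dot(Ar, Ar)


def residual_descent(A, b, x0, step_length, tol=1e-10, max_iter=10_000):
    """Shared loop: x_n = x_{n-1} + alpha_{n-1} r_{n-1}, then r_n = b - A x_n."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            break
        alpha = step_length(A, r)
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    return x


A = [[4.0, 1.0], [1.0, 3.0]]     # symmetric positive-definite test matrix
b = [1.0, 2.0]
x_sd = residual_descent(A, b, [0.0, 0.0], sd_alpha)
x_m1 = residual_descent(A, b, [0.0, 0.0], method1_alpha)
```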






B. Conjugate Gradient Method vs Method 2 



Conjugate Gradient Method [1]

1) Start with the first guess $x_0$ and calculate the first residual $r_0 = b - Ax_0$.

2) $d_0 = r_0$

3) Repeat until $|r_n| \approx 0$ or a certain number of iterations is reached:

$$\alpha_{n-1} = \frac{r_{n-1}^T r_{n-1}}{d_{n-1}^T (A d_{n-1})}$$
$$x_n = x_{n-1} + \alpha_{n-1} d_{n-1}$$
$$r_n = r_{n-1} - \alpha_{n-1} (A d_{n-1})$$
$$d_n = r_n + \frac{r_n^T r_n}{r_{n-1}^T r_{n-1}}\, d_{n-1}$$

Requirements: matrix $A$ has to be symmetric and positive-definite.

Method 2

1) Start with the first guess $x_0$ and calculate the first residual $r_0 = b - Ax_0$.

2) $d_0 = A^T r_0$

3) Repeat until $|r_n| \approx 0$ or a certain number of iterations is reached:

$$\alpha_{n-1} = \frac{(A d_{n-1})^T r_{n-1}}{(A d_{n-1})^T (A d_{n-1})}$$
$$x_n = x_{n-1} + \alpha_{n-1} d_{n-1}$$
$$r_n = r_{n-1} - \alpha_{n-1} (A d_{n-1})$$
$$d_n = A^T r_n + \frac{(A^T r_n)^T (A^T r_n)}{(A d_{n-1})^T r_{n-1}}\, d_{n-1}$$

Requirements: none.
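For reference, the Conjugate Gradient column can be sketched in the same plain-Python style (function name and data hypothetical). It stores $r_{n-1}^T r_{n-1}$ so each iteration needs only one product with $A$. The test matrix is symmetric positive-definite, as the method requires.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=None):
    """Conjugate Gradient as listed in the table (A symmetric positive-definite)."""
    max_iter = len(b) if max_iter is None else max_iter
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]        # r_0
    d = list(r)                                              # d_0 = r_0
    rr = dot(r, r)                                           # r_{n-1}^T r_{n-1}
    for _ in range(max_iter):
        if math.sqrt(rr) <= tol:
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)                              # alpha_{n-1}
        x = [xk + alpha * dk for xk, dk in zip(x, d)]        # x_n
        r = [rk - alpha * adk for rk, adk in zip(r, Ad)]     # r_n
        rr_new = dot(r, r)
        d = [rk + (rr_new / rr) * dk for rk, dk in zip(r, d)]   # d_n = r_n + beta d_{n-1}
        rr = rr_new
    return x


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]    # symmetric positive-definite
b = [1.0, 2.0, 3.0]
x = conjugate_gradient(A, b, [0.0, 0.0, 0.0])
```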



V. Implementation 
A Matlab implementation can be found on the author's blog [2].



VI. Conclusion 

This article presented the mathematical background of two iterative methods, Method 1 and Method 2. Both methods can be used to solve $Ax = b$ for any square matrix $A$. Theoretically, Method 2 gives the solution after at most $m$ iterations, where $m$ is the size of the matrix $A$.